」工欲善其事,必先利其器。「—孔子《論語.錄靈公》
首頁 > 程式設計 > 使用 Python 掌握機器學習:基礎與關鍵概念

使用 Python 掌握機器學習:基礎與關鍵概念

發佈於2024-11-04
瀏覽:905

In today's era of Artificial Intelligence (AI), scaling businesses and streamlining workflows has never been easier or more accessible. AI and machine learning equip companies to make informed decisions, giving them a superpower to predict the future with just a few lines of code. Before taking a significant risk, wouldn't knowing if it's worth it be beneficial? Have you ever wondered how these AIs and machine learning models are trained to make such precise predictions?
In this article, we will explore, hands-on, how to create a machine-learning model that can make predictions from our input data. Join me on this journey as we delve into these principles together.
This is the first part of a series on mastering machine learning, focusing on the foundations and key concepts. In the second part, we will dive deeper into advanced techniques and real-world applications.

Introduction:

Machine Learning (ML) essentially means training a model to solve problems. It involves feeding large amounts of data (input-data) to a model, enabling it to learn and discover patterns from the data. Interestingly, the model's accuracy depends solely on the quantity and quality of data it is fed.

Machine learning extends beyond making predictions for enterprises; it powers innovations like self-driving cars, robotics, and much more. With continuous advancements in ML, there's no telling what incredible achievements lie ahead - it's simply amazing, right?

There's no contest as to why Python remains one of the most sought-after programming languages for machine learning. Its vast libraries, such as Scikit-Learn and Pandas, and its easy-to-read syntax make it ideal for ML tasks. Python offers a simplified and well-structured environment that allows developers to maximize their potential. As an open-source programming language, it benefits from contributions worldwide, making it even more suitable and advantageous for data science and machine learning.

Fundamentals Of Machine Learning

Machine Learning (ML) is a vast and complex field that requires years of continuous learning and practice. While it's impossible to cover everything in this article, let's look into some important fundamentals of machine learning, specifically:

  • Supervised Machine Learning From its name, we can deduce that supervised machine learning involves some form of monitoring or structure. It entails mapping one function to another; that is, providing labeled data input (i) to the machine, explaining what should be done (algorithms), and waiting for its output (j). Through this mapping, the machine learns to predict the output (j) whenever an input (i) is fed into it. The result will always remain output (j). Supervised ML can further be classified into:

Regression: When a variable input (i) is supplied as data to train a machine, it produces a continuous numerical output (j). For example, a regression algorithm can be used to predict the price of an item based on its size and other features.

Classification: This algorithm makes predictions based on grouping by determining certain attributes that make up the group. For example, predicting whether a product review is positive, negative, or neutral.

  • Unsupervised Machine Learning Unsupervised Machine Learning tackles unlabeled or unmonitored data. Unlike supervised learning, where models are trained on labeled data, unsupervised learning algorithms identify patterns and relationships in data without prior knowledge of the outcomes. For example, grouping customers based on their purchasing behavior.

Setting Up Your Environment

When setting up your environment to create your first model, it's essential to understand some basic steps in ML and familiarize yourself with the libraries and tools we will explore in this article.

Steps in Machine Learning:

  1. Import the Data: Gather the data you need for your analysis.
  2. Clean the Data: Ensure your data is in good and complete shape by handling missing values and correcting inconsistencies.
  3. Split the Data: Divide the data into training and test sets.
  4. Create a Model: Choose your preferred algorithm to analyze the data and build your model.
  5. Train the Model: Use the training set to teach your model.
  6. Make Predictions: Use the test set to make predictions with your trained model.
  7. Evaluate and Improve: Assess the model's performance and refine it based on the outputs.

Common Libraries and Tools:

  • NumPy: Known for providing multidimensional arrays, NumPy is fundamental for numerical computations.

  • Pandas: A data analysis library that offers data frames (two-dimensional data structures similar to Excel spreadsheets) with rows and columns.

  • Matplotlib: Matplotlib is a two-dimensional plotting library for creating graphs and plots.

  • Scikit-Learn: The most popular machine learning library, providing all common algorithms like decision trees, neural networks, and more.

Recommended Development Environment:

Standard IDEs such as VS Code or terminals may not be ideal when creating a model due to the difficulty in inspecting data while writing code. For our learning purposes, the recommended environment is Jupyter Notebook, which provides an interactive platform to write and execute code, visualize data, and document the process simultaneously.

Step-by-Step Setup:

Download Anaconda:
Anaconda is a popular distribution of Python and R for scientific computing and data science. It includes the Jupyter Notebook and other essential tools.

Download Anaconda from this link.
Install Anaconda:
Follow the installation instructions based on your operating system (Windows, macOS, or Linux).
After the installation is complete, you will have access to the Anaconda Navigator, which is a graphical interface for managing your Anaconda packages, environments, and notebooks.
Launching Jupyter Notebook:

Mastering Machine Learning with Python: Foundations and Key Concepts

Open the Anaconda Navigator
In the Navigator, click on the "Environments" tab.
Select the "base (root)" environment, and then click "Open with Terminal" or "Open Terminal" (the exact wording may vary depending on the OS).
In the terminal window that opens, type the command jupyter notebook and press Enter.

Mastering Machine Learning with Python: Foundations and Key Concepts

This command will launch the Jupyter Notebook server and automatically open a new tab in your default web browser, displaying the Jupyter Notebook interface.

Using Jupyter Notebook:

The browser window will show a file directory where you can navigate to your project folder or create new notebooks.
Click "New" and select "Python 3" (or the appropriate kernel) to create a new Jupyter Notebook.
You can now start writing and executing your code in the cells of the notebook. The interface allows you to document your code, visualize data, and explore datasets interactively.

Mastering Machine Learning with Python: Foundations and Key Concepts

Building Your First Machine Learning Model

In building your first model, we have to take cognizance of the steps in Machine Learning as discussed earlier, which are:

  1. Import the Data
  2. Clean the Data
  3. Split the Data
  4. Create a Model
  5. Train the Model
  6. Make Predictions
  7. Evaluate and Improve

Now, let's assume a scenario involving an online bookstore where users sign up and provide their necessary information such as name, age, and gender. Based on their profile, we aim to recommend various books they are likely to buy and build a model that helps boost sales.

First, we need to feed the model with sample data from existing users. The model will learn patterns from this data to make predictions. When a new user signs up, we can tell the model, "Hey, we have a new user with this profile. What kind of book are they likely to be interested in?" The model will then recommend, for instance, a history or a romance novel, and based on that, we can make personalized suggestions to the user.

Let's break down the process step-by-step:

  1. Import the Data: Load the dataset containing user profiles and their book preferences.
  2. Clean the Data: Handle missing values, correct inconsistencies, and prepare the data for analysis.
  3. Split the Data: Divide the dataset into training and testing sets to evaluate the model's performance.
  4. Create a Model: Choose a suitable machine learning algorithm to build the recommendation model.
  5. Train the Model: Train the model using the training data to learn the patterns and relationships within the data.
  6. Make Predictions: Use the trained model to predict book preferences for new users based on their profiles.
  7. Evaluate and Improve: Assess the model's accuracy using the testing data and refine it to improve its performance.

By following these steps, you will be able to build a machine-learning model that effectively recommends books to users, enhancing their experience and boosting sales for the online bookstore. You can gain access to the datasets used in this tutorial here.

Let's walk through a sample code snippet to illustrate the process of testing the accuracy of the model:

  • Import the necessary libraries:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

We start by importing the essential libraries. pandas is used for data manipulation and analysis, while DecisionTreeClassifier, train_test_split, and accuracy_score are from Scikit-learn, a popular machine learning library.

  • Load the dataset:
book_data = pd.read_csv('book_Data.csv')
Read the dataset from a `CSV file` into a pandas DataFrame.
  • Prepare the data:
X = book_data.drop(columns=['Genre'])
y = book_data['Genre']

Create a feature matrix X by dropping the 'Genre' column from the dataset and a target vector y containing the 'Genre' column.

  • Split the data:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Split the data into training and testing sets with 80% for training and 20% for testing.

  • Initialize and train the model:
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

Initialize the DecisionTreeClassifier model and train it using the training data.

  • Make predictions and evaluate the model:
predictions = model.predict(X_test)
score = accuracy_score(y_test, predictions)
print(score)

Make predictions on the test data and calculate the accuracy of the model by comparing the test labels to the predictions. Finally, print the accuracy score to the console.

In this example, we start by importing the essential libraries. Pandas is used for data manipulation and analysis, while DecisionTreeClassifier, train_test_split, and accuracy_score are from Scikit-learn, a popular machine learning library. We then read the dataset from a CSV file into a pandas DataFrame, prepare the data by creating a feature matrix X and a target vector y, split the data into training and testing sets, initialize and train the DecisionTreeClassifier model, make predictions on the test data, and calculate the accuracy of the model by comparing the test labels to the predictions.

Depending on the data you're using, the results will vary. For instance, in the output below, the accuracy score displayed is 0.7, but it may show 0.5 when the code is run again with a different dataset. The accuracy score will vary, a higher score indicates a more accurate model.
Output:

Mastering Machine Learning with Python: Foundations and Key Concepts

Data Preprocessing:

Now that you've successfully created your model, it's important to note that the kind of data used to train your model is crucial to the accuracy and reliability of your predictions. In Mastering Data Analysis: Unveiling the Power of Fairness and Bias in Information, I discussed extensively the importance of data cleaning and ensuring data fairness. Depending on what you intend to do with your model, it is essential to consider if your data is fair and free of any bias. Data cleaning is a very vital part of machine learning, ensuring that your model is trained on accurate, unbiased data. Some of these ethical considerations are:

  1. Removing Outliers: Ensure that the data does not contain extreme values that could skew the model's predictions.

  2. Handling Missing Values: Address any missing data points to avoid inaccurate predictions.

  3. Standardizing Data: Make sure the data is in a consistent format, allowing the model to interpret it correctly.

  4. Balancing the Dataset: Ensure that your dataset represents all categories fairly to avoid bias in predictions.

  5. Ensuring Data Fairness: Check for any biases in your data that could lead to unfair predictions and take steps to mitigate them.

By addressing these ethical considerations, you ensure that your model is not only accurate but also fair and reliable, providing meaningful predictions.

Conclusion:

Machine learning is a powerful tool that can transform data into valuable insights and predictions. In this article, we explored the fundamentals of machine learning, focusing on supervised and unsupervised learning, and demonstrated how to set up your environment and build a simple machine learning model using Python and its libraries. By following these steps and experimenting with different algorithms and datasets, you can unlock the potential of machine learning to solve complex problems and make data-driven decisions.

In the next part of this series, we will dive deeper into advanced techniques and real-world applications of machine learning, exploring topics such as feature engineering, model evaluation, and optimization. Stay tuned for more insights and practical examples to enhance your machine-learning journey.

Additional Resources:

  • Programming with Mosh

  • Machine Learning Tutorial geeksforgeeks

版本聲明 本文轉載於:https://dev.to/eztosin/mastering-machine-learning-with-python-foundations-and-key-concepts-54di?1如有侵犯,請聯絡[email protected]刪除
最新教學 更多>
  • 在 Go 中使用 WebSocket 進行即時通信
    在 Go 中使用 WebSocket 進行即時通信
    构建需要实时更新的应用程序(例如聊天应用程序、实时通知或协作工具)需要一种比传统 HTTP 更快、更具交互性的通信方法。这就是 WebSockets 发挥作用的地方!今天,我们将探讨如何在 Go 中使用 WebSocket,以便您可以向应用程序添加实时功能。 在这篇文章中,我们将介绍: WebSoc...
    程式設計 發佈於2024-11-17
  • 大批
    大批
    方法是可以在物件上呼叫的 fns 數組是對象,因此它們在 JS 中也有方法。 slice(begin):將陣列的一部分提取到新數組中,而不改變原始數組。 let arr = ['a','b','c','d','e']; // Usecase: Extract till index ...
    程式設計 發佈於2024-11-17
  • Bootstrap 4 Beta 中的列偏移發生了什麼事?
    Bootstrap 4 Beta 中的列偏移發生了什麼事?
    Bootstrap 4 Beta:列偏移的刪除和恢復Bootstrap 4 在其Beta 1 版本中引入了重大更改柱子偏移了。然而,隨著 Beta 2 的後續發布,這些變化已經逆轉。 從 offset-md-* 到 ml-auto在 Bootstrap 4 Beta 1 中, offset-md-*...
    程式設計 發佈於2024-11-17
  • Numpy 備忘單
    Numpy 備忘單
    Comprehensive Guide to NumPy: The Ultimate Cheat Sheet NumPy (Numerical Python) is a fundamental library for scientific computing in Python. ...
    程式設計 發佈於2024-11-17
  • 你需要像專業人士一樣閱讀科技文章
    你需要像專業人士一樣閱讀科技文章
    在快节奏的技术世界中,并非您阅读的所有内容都是准确或公正的。并非您读到的所有内容都是由人类编写的! 细节可能存在微妙的错误,或者文章可能故意误导。让我们来看看一些可以帮助您阅读科技文章或任何媒体内容的技能。 1. 培养健康的怀疑态度 培养健康的怀疑态度至关重要。质疑大胆的主张,寻找...
    程式設計 發佈於2024-11-17
  • 如何找到一個多維數組中存在但另一個多維數組中不存在的行?
    如何找到一個多維數組中存在但另一個多維數組中不存在的行?
    比較多維數組的關聯行您有兩個多維數組,$pageids 和$parentpage,其中每行代表一個包含列的記錄“id”、“連結標籤”和“url”。您想要尋找 $pageids 中存在但不在 $parentpage 中的行,從而有效地建立一個包含缺少行的陣列 ($pageWithNoChildren)...
    程式設計 發佈於2024-11-17
  • 為什麼 Windows 中會出現「Java 無法辨識」錯誤以及如何修復它?
    為什麼 Windows 中會出現「Java 無法辨識」錯誤以及如何修復它?
    解決Windows 中的「Java 無法識別」錯誤嘗試在Windows 7 上檢查Java 版本時,使用者可能會遇到錯誤「'Java' 無法識別”作為內部或外部命令。 」此問題通常是由於缺少Java 安裝或環境變數不正確而引起的。要解決此問題,您需要驗證Java 安裝並配置必要的環境...
    程式設計 發佈於2024-11-17
  • 儘管檔案存在且有權限,為什麼 File.delete() 會回傳 False?
    儘管檔案存在且有權限,為什麼 File.delete() 會回傳 False?
    儘管存在並進行權限檢查,File.delete() 返回False使用FileOutputStream 寫入檔案後嘗試刪除檔案時,某些使用者遇到意外問題: file.delete() 傳回false。儘管檔案存在且所有權限檢查(.exists()、.canRead()、.canWrite()、.ca...
    程式設計 發佈於2024-11-17
  • 如何有效地從 Go 中的切片中刪除重複的對等點?
    如何有效地從 Go 中的切片中刪除重複的對等點?
    從切片中刪除重複項給定一個文字文件,其中包含表示為具有“Address”和“PeerID”的對象的對等點清單屬性,任務是根據程式碼配置中「Bootstrap」切片中匹配的「Address」和「PeerID」刪除所有重複的對等點。 為了實現此目的,我們迭代切片中的每個對等點物件多次。在每次迭代期間,我...
    程式設計 發佈於2024-11-17
  • 如何在 PHP 中組合兩個關聯數組,同時保留唯一 ID 並處理重複名稱?
    如何在 PHP 中組合兩個關聯數組,同時保留唯一 ID 並處理重複名稱?
    在 PHP 中組合關聯數組在 PHP 中,將兩個關聯數組組合成一個數組是常見任務。考慮以下請求:問題描述:提供的代碼定義了兩個關聯數組,$array1和$array2。目標是建立一個新陣列 $array3,它合併兩個陣列中的所有鍵值對。 此外,提供的陣列具有唯一的 ID,而名稱可能重疊。要求是建構一...
    程式設計 發佈於2024-11-17
  • 如何自訂Bootstrap 4的檔案輸入元件?
    如何自訂Bootstrap 4的檔案輸入元件?
    繞過 Bootstrap 4 檔案輸入的限制Bootstrap 4 提供了自訂檔案輸入元件來簡化使用者的檔案選擇。但是,如果您希望自訂「選擇檔案...」佔位符文字或顯示所選檔案的名稱,您可能會遇到一些挑戰。 更改 Bootstrap 4.1 及更高版本中的佔位符自 Bootstrap 4.1 起,佔...
    程式設計 發佈於2024-11-17
  • 如何在 CSS 盒子上創建斜角?
    如何在 CSS 盒子上創建斜角?
    在 CSS 框上建立斜角可以使用多種方法在 CSS 框上實現斜角。一種方法描述如下:使用邊框的方法此技術依賴於沿容器左側建立透明邊框和沿底部建立傾斜邊框。以下程式碼示範如何實現:<div class="cornered"></div> <div cl...
    程式設計 發佈於2024-11-17
  • 如何在 Pandas DataFrame 中的字串中新增前導零?
    如何在 Pandas DataFrame 中的字串中新增前導零?
    在 Pandas Dataframe 中的字串中加入前導零在 Pandas 中,處理字串有時需要修改其格式。一項常見任務是向資料幀中的字串新增前導零。這在處理需要轉換為字串格式的數值資料(例如 ID 或日期)時特別有用。 要實現此目的,您可以利用 Pandas Series 的 str 屬性。此屬性...
    程式設計 發佈於2024-11-17
  • 您是否應該異步加載腳本以提高網站效能?
    您是否應該異步加載腳本以提高網站效能?
    非同步腳本載入以提高網站效能在現今的Web 開發領域,優化頁面載入速度對於使用者體驗和搜尋引擎優化至關重要。提高效能的有效技術之一是非同步載入腳本,使瀏覽器能夠與其他頁面元素並行下載腳本。 傳統方法是將腳本標籤直接放置在 HTML 文件中,但這種方法常常會造成瓶頸因為瀏覽器必須等待每個腳本完成載入才...
    程式設計 發佈於2024-11-17
  • 如何將 Python 日期時間物件轉換為自紀元以來的毫秒數?
    如何將 Python 日期時間物件轉換為自紀元以來的毫秒數?
    在Python 中將日期時間物件轉換為自紀元以來的毫秒數Python 的datetime 物件提供了一種穩健的方式來表示日期和時間。但是,某些情況可能需要將 datetime 物件轉換為自 UNIX 紀元以來的毫秒數,表示自 1970 年 1 月 1 日協調世界時 (UTC) 午夜以來經過的毫秒數。...
    程式設計 發佈於2024-11-17

免責聲明: 提供的所有資源部分來自互聯網,如果有侵犯您的版權或其他權益,請說明詳細緣由並提供版權或權益證明然後發到郵箱:[email protected] 我們會在第一時間內為您處理。

Copyright© 2022 湘ICP备2022001581号-3